Eau Claire County
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Arcuschin, Ivรกn, Janiak, Jett, Krzyzanowski, Robert, Rajamanoharan, Senthooran, Nanda, Neel, Conmy, Arthur
Chain-of-Thought (CoT) reasoning has significantly advanced state-of-the-art AI capabilities. However, recent studies have shown that CoT reasoning is not always faithful, i.e. CoT reasoning does not always reflect how models arrive at conclusions. So far, most of these studies have focused on unfaithfulness in unnatural contexts where an explicit bias has been introduced. In contrast, we show that unfaithful CoT can occur on realistic prompts with no artificial bias. Our results reveal non-negligible rates of several forms of unfaithful reasoning in frontier models: Sonnet 3.7 (16.3%), DeepSeek R1 (5.3%) and ChatGPT-4o (7.0%) all answer a notable proportion of question pairs unfaithfully. Specifically, we find that models rationalize their implicit biases in answers to binary questions ("implicit post-hoc rationalization"). For example, when separately presented with the questions "Is X bigger than Y?" and "Is Y bigger than X?", models sometimes produce superficially coherent arguments to justify answering Yes to both questions or No to both questions, despite such responses being logically contradictory. We also investigate restoration errors (Dziri et al., 2023), where models make and then silently correct errors in their reasoning, and unfaithful shortcuts, where models use clearly illogical reasoning to simplify solving problems in Putnam questions (a hard benchmark). Our findings raise challenges for AI safety work that relies on monitoring CoT to detect undesired behavior.
Gumbel Counterfactual Generation From Language Models
Ravfogel, Shauli, Svete, Anej, Snรฆbjarnarson, Vรฉsteinn, Cotterell, Ryan
Understanding and manipulating the causal generation mechanisms in language models is essential for controlling their behavior. Previous work has primarily relied on techniques such as representation surgery -- e.g., model ablations or manipulation of linear subspaces tied to specific concepts -- to \emph{intervene} on these models. To understand the impact of interventions precisely, it is useful to examine counterfactuals -- e.g., how a given sentence would have appeared had it been generated by the model following a specific intervention. We highlight that counterfactual reasoning is conceptually distinct from interventions, as articulated in Pearl's causal hierarchy. Based on this observation, we propose a framework for generating true string counterfactuals by reformulating language models as a structural equation model using the Gumbel-max trick, which we called Gumbel counterfactual generation. This reformulation allows us to model the joint distribution over original strings and their counterfactuals resulting from the same instantiation of the sampling noise. We develop an algorithm based on hindsight Gumbel sampling that allows us to infer the latent noise variables and generate counterfactuals of observed strings. Our experiments demonstrate that the approach produces meaningful counterfactuals while at the same time showing that commonly used intervention techniques have considerable undesired side effects.
Exploration and Evaluation of Bias in Cyberbullying Detection with Machine Learning
Root, Andrew, Jakubowski, Liam, Vanamala, Mounika
It is well known that the usefulness of a machine learning model is due to its ability to generalize to unseen data. This study uses three popular cyberbullying datasets to explore the effects of data, how it's collected, and how it's labeled, on the resulting machine learning models. The bias introduced from differing definitions of cyberbullying and from data collection is discussed in detail. An emphasis is made on the impact of dataset expansion methods, which utilize current data points to fetch and label new ones. Furthermore, explicit testing is performed to evaluate the ability of a model to generalize to unseen datasets through cross-dataset evaluation. As hypothesized, the models have a significant drop in the Macro F1 Score, with an average drop of 0.222. As such, this study effectively highlights the importance of dataset curation and cross-dataset testing for creating models with real-world applicability. The experiments and other code can be found at https://github.com/rootdrew27/cyberbullying-ml.
The Role of Emotions in Informational Support Question-Response Pairs in Online Health Communities: A Multimodal Deep Learning Approach
Jozani, Mohsen, Williams, Jason A., Aleroud, Ahmed, Bhagat, Sarbottam
This study explores the relationship between informational support seeking questions, responses, and helpfulness ratings in online health communities. We created a labeled data set of question-response pairs and developed multimodal machine learning and deep learning models to reliably predict informational support questions and responses. We employed explainable AI to reveal the emotions embedded in informational support exchanges, demonstrating the importance of emotion in providing informational support. This complex interplay between emotional and informational support has not been previously researched. The study refines social support theory and lays the groundwork for the development of user decision aids. Further implications are discussed.
VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial Presence
Bakshi, Mayank, Ghasvarianjahromi, Sara, Yakimenka, Yauhen, Beemer, Allison, Kosut, Oliver, Kliewer, Joerg
We introduce the paradigm of validated decentralized learning for undirected networks with heterogeneous data and possible adversarial infiltration. We require (a) convergence to a global empirical loss minimizer when adversaries are absent, and (b) either detection of adversarial presence or convergence to an admissible consensus model in their presence. This contrasts sharply with the traditional byzantine-robustness requirement of convergence to an admissible consensus irrespective of the adversarial configuration. A distinctive aspect of our study is a heterogeneity metric based on the norms of individual agents' gradients computed at the global empirical loss minimizer. Machine learning is increasingly reliant on data from a variety of distributed sources. As such, it may be difficult to ensure that the data which originates from these sources is trustworthy. Thus, there is a need to develop distributed and decentralized learning strategies that can respond to bad or even malicious data. However, worst-case or Byzantine resilience is an extremely strong requirement, that performance be maintained if a malicious adversary controls a subset of the processing nodes and takes any conceivable action. In practice, an adversary launching such an attack against a learning process requires tremendous resources which may not be worth the cost to influence the learned model. Thus, even though malicious adversaries are a threat, for the vast majority of the time, they are not present. An algorithm that maintains Byzantine robustness necessarily sacrifices performance when no adversaries are present.
Recent Advancements In The Field Of Deepfake Detection
Krueger, Natalie, Vanamala, Dr. Mounika, Dave, Dr. Rushit
A deepfake is a photo or video of a person whose image has been digitally altered or partially replaced with an image of someone else. Deepfakes have the potential to cause a variety of problems and are often used maliciously. A common usage is altering videos of prominent political figures and celebrities. These deepfakes can portray them making offensive, problematic, and/or untrue statements. Current deepfakes can be very realistic, and when used in this way, can spread panic and even influence elections and political opinions. There are many deepfake detection strategies currently in use but finding the most comprehensive and universal method is critical. So, in this survey we will address the problems of malicious deepfake creation and the lack of universal deepfake detection methods. Our objective is to survey and analyze a variety of current methods and advances in the field of deepfake detection.
RELDEC: Reinforcement Learning-Based Decoding of Moderate Length LDPC Codes
Habib, Salman, Beemer, Allison, Kliewer, Joerg
In this work we propose RELDEC, a novel approach for sequential decoding of moderate length low-density parity-check (LDPC) codes. The main idea behind RELDEC is that an optimized decoding policy is subsequently obtained via reinforcement learning based on a Markov decision process (MDP). In contrast to our previous work, where an agent learns to schedule only a single check node (CN) within a group (cluster) of CNs per iteration, in this work we train the agent to schedule all CNs in a cluster, and all clusters in every iteration. That is, in each learning step of RELDEC an agent learns to schedule CN clusters sequentially depending on a reward associated with the outcome of scheduling a particular cluster. We also modify the state space representation of the MDP, enabling RELDEC to be suitable for larger block length LDPC codes than those studied in our previous work. Furthermore, to address decoding under varying channel conditions, we propose agile meta-RELDEC (AM-RELDEC) that employs meta-reinforcement learning. The proposed RELDEC scheme significantly outperforms standard flooding and random sequential decoding for a variety of LDPC codes, including codes designed for 5G new radio.
Hybrid Deepfake Detection Utilizing MLP and LSTM
Mallet, Jacob, Krueger, Natalie, Vanamala, Mounika, Dave, Rushit
The growing reliance of society on social media for authentic information has done nothing but increase over the past years. This has only raised the potential consequences of the spread of misinformation. One of the growing methods in popularity is to deceive users using a deepfake. A deepfake is an invention that has come with the latest technological advancements, which enables nefarious online users to replace their face with a computer generated, synthetic face of numerous powerful members of society. Deepfake images and videos now provide the means to mimic important political and cultural figures to spread massive amounts of false information. Models that can detect these deepfakes to prevent the spread of misinformation are now of tremendous necessity. In this paper, we propose a new deepfake detection schema utilizing two deep learning algorithms: long short term memory and multilayer perceptron. We evaluate our model using a publicly available dataset named 140k Real and Fake Faces to detect images altered by a deepfake with accuracies achieved as high as 74.7%
Leveraging Deep Learning Approaches for Deepfake Detection: A Review
Tiwari, Aniruddha, Dave, Rushit, Vanamala, Mounika
Conspicuous progression in the field of machine learning and deep learning have led the jump of highly realistic fake media, these media oftentimes referred as deepfakes. Deepfakes are fabricated media which are generated by sophisticated AI that are at times very difficult to set apart from the real media. So far, this media can be uploaded to the various social media platforms, hence advertising it to the world got easy, calling for an efficacious countermeasure. Thus, one of the optimistic counter steps against deepfake would be deepfake detection. To undertake this threat, researchers in the past have created models to detect deepfakes based on ML/DL techniques like Convolutional Neural Networks. This paper aims to explore different methodologies with an intention to achieve a cost-effective model with a higher accuracy with different types of the datasets, which is to address the generalizability of the dataset.